Towards Universal Web Parsebanks

Authors

  • Juhani Luotolahti
  • Jenna Kanerva
  • Veronika Laippala
  • Sampo Pyysalo
  • Filip Ginter
Abstract

Recently, there has been great interest both in the development of cross-linguistically applicable annotation schemes and in the application of syntactic parsers at web scale to create parsebanks of online texts. The combination of these two trends to create massive, consistently annotated parsebanks in many languages holds enormous potential for the quantitative study of many linguistic phenomena, but these opportunities have been only partially realized in previous work. In this work, we take a key step toward universal web parsebanks through a single-language case study introducing the first retrainable parser applied to the Universal Dependencies representation and its application to create a Finnish web-scale parsebank. We further integrate this data into an online dependency search system and demonstrate its applicability by showing linguistically motivated search examples and by using the dependency syntax information to analyze the language of the web corpus. We conclude with a discussion of the requirements of extending from this case study on Finnish to create consistently annotated web-scale parsebanks for a large number of languages.

Similar resources

Dep_search: Efficient Search Tool for Large Dependency Parsebanks

We present an updated and improved version of our syntactic analysis query toolkit, dep_search, geared towards morphologically rich languages and large parsebanks. The query language supports complex searches on dependency graphs, including for example boolean logic and nested queries. Improvements we present here include better data indexing, especially better database backend and document met...

SETS: Scalable and Efficient Tree Search in Dependency Graphs

We present a syntactic analysis query toolkit geared specifically towards massive dependency parsebanks and morphologically rich languages. The query language allows arbitrary tree queries, including negated branches, and is suitable for querying analyses with rich morphological annotation. Treebanks of over a million words can be comfortably queried on a low-end netbook, and a parsebank with o...

Self-Adapting Web-based Systems: Towards Universal Accessibility

This paper discusses the employment of self-adaptation techniques in WWW-based interactive systems, as a tool for ensuring their universal accessibility. The paper first elaborates on the underpinnings of universal accessibility and their relevance to Web applications and services. Then it provides a contextual definition of self-adapting systems and an account of how self-adaptation relates to...

Towards Universal Web Service Clients

In this paper we propose the Web Service Description Framework (WSDF), a suite of tools for the semantic annotation and invocation of side effect free Web Services. WSDF bases on established standards from both the Web Services and the Semantic Web communities. We developed a software package on top of Apache AXIS and a relational database server that allows an application to invoke a service w...

Publication date: 2015